Goto

Collaborating Authors

 language name


Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP

Bakagianni, Juli, Pouli, Kanella, Gavriilidou, Maria, Pavlopoulos, John

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) research has traditionally been predominantly focused on English, driven by the availability of resources, the size of the research community, and market demands. Recently, there has been a noticeable shift towards multilingualism in NLP, recognizing the need for inclusivity and effectiveness across diverse languages and cultures. Monolingual surveys have the potential to complement the broader trend towards multilingualism in NLP by providing foundational insights and resources necessary for effectively addressing the linguistic diversity of global communication. However, monolingual NLP surveys are extremely rare in literature. This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys. Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks. We include a classification of Language Resources (LRs), according to their availability, and datasets, according to their annotation, to highlight publicly-available and machine-actionable LRs. By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022, providing a comprehensive overview of the current state and challenges of Greek NLP research. We discuss the progress of Greek NLP and outline encountered Greek LRs, classified by availability and usability. As we show, our proposed method helps avoid common pitfalls, such as data leakage and contamination, and to assess language support per NLP task. We consider this systematic literature review of Greek NLP an application of our method that showcases the benefits of a monolingual NLP survey. Similar applications could be regard the myriads of languages whose progress in NLP lags behind that of well-supported languages.


Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

Chen, Wei-Rui, Adebara, Ife, Doan, Khai Duy, Liao, Qisheng, Abdul-Mageed, Muhammad

arXiv.org Artificial Intelligence

Recently, ChatGPT has emerged as a powerful NLP tool that can carry out several tasks. However, the range of languages ChatGPT can handle remains largely a mystery. In this work, we investigate ChatGPT's language identification abilities. For this purpose, we compile Babel-670, a benchmark comprising $670$ languages representing $23$ language families. Languages in Babel-670 run the gamut between the very high-resource to the very low-resource and are spoken in five continents. We then study ChatGPT's (both GPT-3.5 and GPT-4) ability to (i) identify both language names and language codes (ii) under both zero- and few-shot conditions (iii) with and without provision of label set. When compared to smaller finetuned language identification tools, we find that ChatGPT lags behind. Our empirical analysis shows the reality that ChatGPT still resides in a state of potential enhancement before it can sufficiently serve diverse communities.


Translating with Google Sheets

#artificialintelligence

Google Translate is an amazing feat of engineering, which uses artificial intelligence to translate speech and text from a chosen language into another. In most cases, Google Translate's own interface embedded in Google Search or on translate.google.com Again, as in other Case Studies presented here, Google Sheets comes to the rescue! Other than formatting the file to your liking, you can create some drop-down lists for the Source and Target languages. This will help you being more productive as you do not need to search for the language codes every time you want to change them. In my case, I used the Data Validation feature using as a criterion a List from a Range.